Abstract

Recently Yoffe et al. observed that the average distances between 5'-3' ends
of RNA molecules are very small and largely independent of sequence length. This
observation is based on numerical computations as well as theoretical arguments
maximizing certain entropy functionals. In this paper we compute the exact
distribution of 5'-3' distances of RNA secondary structures for any finite n. We
furthermore compute the limit distribution and show that already for n = 30 the exact
distribution and the limit distribution are very close. Our results show that the
distances of random RNA secondary structures are distinctively lower than those of
minimum free energy structures of random RNA sequences.

1. Introduction and background



The closeness of 5' and 3' ends of RNA molecules has distinct biological significance,
for instance for the replication efficiency of single stranded RNA viruses or the
efficient translation of messenger RNA molecules. It is speculated in (Yoffe et al.
2011) that this effective circularization of large RNA molecules is rather a generic
phenomenon of large RNA molecules and independent of sequence length. It is to
a large extent attributed to the high number of paired bases.



In this paper we study the distribution of 5'-3' distances in RNA secondary
structures. We first compute the distribution of 5'-3' distances of RNA secondary
structures of length n by means of a bivariate generating function. The key idea is to
view secondary structures as tableaux sequences and to relate the 5'-3' distance to
the nontrivial returns (Jin and Reidys 2010b) of the corresponding path of shapes.



Secondly, we derive the limit distribution of 5'-3' distances. The idea is to compute
the singular expansion of the above generating function via the subcritical paradigm
(Flajolet and Sedgewick 2009) and to employ a discrete limit theorem.

Our results prove that the 5'-3' distances of random RNA structures are distinctively
smaller than those of biological RNA molecules and minimum free energy (mfe)
RNA structures. This comes as a surprise since the number of paired bases in
random structures is 55.2% (Reidys 2011) and therefore smaller than the 60% of
mfe structures (Fontana et al.).



An RNA structure is the helical configuration of its primary sequence, i.e. the
sequence of nucleotides A, G, U and C, together with Watson-Crick (A-U, G-C)
and (U-G) base pairs. The combinatorics of RNA secondary structures has
been pioneered by Waterman (Howell et al. 1980; Penner and Waterman 1993;
Waterman and Schmitt 1994; Waterman 1978, 1979). We interpret an RNA secondary
structure as a diagram, i.e. a labeled graph over the vertex set $[n] = \{1, \ldots, n\}$,
represented by drawing its vertices 1, ..., n in a horizontal line and connecting them
via the set of backbone-edges $\{\{i, i+1\} \mid 1 \le i \le n-1\}$. Besides its backbone
edges a diagram exhibits arcs (i, j), that are drawn in the upper half-plane. Note
that an arc of the form (i, i+1), or 1-arc, is distinguished from the backbone edge
{i, i+1}. However, no confusion can arise since an RNA secondary structure is a
diagram having no 1-arcs and only noncrossing arcs in the upper half-plane, see Fig. 1.



The 5'-3' distance of an RNA secondary structure is the minimal length of a path
of the diagram from vertex 1 to vertex n. Such a diagram-path is comprised of arcs
and backbone-edges, see Fig. 2.

The paper is organized as follows: In Section 2 we discuss some basic facts, in
particular the structure-tableaux correspondence and how to express the 5'-3' distance
via such tableaux-sequences. In Section 3 we compute $\mathbf{W}(z,u)$, the bivariate
generating function of RNA secondary structures of length n having distance d. Section 4
contains the computation of the singular expansion of $\mathbf{W}(z,u)$ and in Section 5 we
combine our results and derive the limit distribution. We finally discuss our results
in Section 6.



2. Preliminaries 



Let $\mathcal{S}_n$ denote the set of RNA secondary structures of length n, $\sigma_n$. All results
of this paper easily generalize to the case of diagrams with noncrossing arcs that
contain no arcs of length smaller than $\lambda > 1$ and to canonical secondary structures
(Reidys 2011), i.e. structures that contain no isolated arcs.



The distance of $\sigma_n$, $d_n(\sigma_n)$, is the minimum length of a path consisting of $\sigma_n$-arcs
and backbone-edges from vertex 1 (the 5'-end) to vertex n (the 3'-end). That is, we
have the mapping $d_n\colon \mathcal{S}_n \to \mathbb{N}$.
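The definition of the distance can be made concrete by a breadth-first search on the diagram. The following Python sketch is illustrative only: the function name `five_three_distance` and the encoding of a structure as a list of arcs (i, j) with 1 <= i < j <= n are assumptions of this example and not notation used in the paper.

```python
from collections import deque


def five_three_distance(n, arcs):
    """Length of a shortest path from vertex 1 to vertex n that uses
    backbone edges {i, i+1} and the given arcs (i, j)."""
    neighbours = {v: set() for v in range(1, n + 1)}
    for v in range(1, n):                 # backbone edges
        neighbours[v].add(v + 1)
        neighbours[v + 1].add(v)
    for i, j in arcs:                     # arcs (base pairs)
        neighbours[i].add(j)
        neighbours[j].add(i)

    dist = {1: 0}
    queue = deque([1])
    while queue:
        v = queue.popleft()
        if v == n:
            return dist[v]
        for w in neighbours[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    # Not reachable in practice: the backbone always connects 1 and n.
    raise ValueError("vertex n is not reachable from vertex 1")
```

For instance, a structure on seven vertices with arcs (1, 3) and (4, 7) has distance 3: the arc (1, 3), the backbone edge {3, 4} and the arc (4, 7).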



A sequence of shapes $(\lambda_0, \lambda_1, \ldots, \lambda_n)$ is called a 1-tableaux of length n, $T_n$, if all
shapes contain only one row of squares and (a) $\lambda_0 = \lambda_n = \varnothing$, (b) $\lambda_{i+1}$ is obtained
from $\lambda_i$ by adding a square (+□), removing a square (-□) or doing nothing (∅)
and (c) there exists no sequence of (+□, -□)-steps. Let $\mathcal{T}_n$ denote the set of all
1-tableaux of length n.



We come next to the tableaux interpretation of secondary structures. The underlying
correspondence is an immediate consequence of (Chen et al. 2007, 2008; Jin et al.
2008). We shall subsequently express the 5'-3' distance via 1-tableaux.



Proposition 1. (Jin et al. 2008) There exists a bijection between RNA secondary
structures and 1-tableaux:

(2.1)    $\beta\colon \mathcal{S}_n \to \mathcal{T}_n$.

Proof. Given $\sigma_n$, we consider the sequence (n, n-1, ..., 1) and, starting with ∅, do
the following:

• if j is the endpoint of an arc (i, j), we add one square,

• if j is the start point of an arc (j, s), we remove one square,

• if j is an isolated point, we do nothing.

This constructs a 1-tableaux of length n and thus defines the map $\beta$. Conversely,
given a 1-tableau $T_n$, $(\varnothing, \lambda_1, \ldots, \lambda_{n-1}, \varnothing)$, reading $\lambda_i \setminus \lambda_{i-1}$ from left to right, at step
i, we do the following:

• for a +□-step at i we insert i into the new square,

• for a ∅-step we do nothing,

• for a -□-step at i we extract the entry j(i) of the rightmost square.

The latter extractions generate the arc-set $\{(j(i), i) \mid i \text{ is a } {-\square}\text{-step}\}$ that contains by
definition of $T_n$ no 1-arcs. Thus this procedure generates a secondary structure of length
n without 1-arcs, which, by construction, is the inverse of $\beta$ and the proposition
follows. □

A secondary structure $\sigma_n$ is irreducible if $\beta(\sigma_n)$ is a sequence of shapes $(\lambda_0, \ldots, \lambda_n)$
such that $\lambda_j \neq \varnothing$ for $1 \le j \le n-1$. An irreducible substructure of $\sigma_n$ is a subsequence
$(\lambda_i, \ldots, \lambda_{i+k})$ such that $\lambda_{i-1} = \varnothing$ and $\lambda_{i+k} = \varnothing$ and $\lambda_j \neq \varnothing$ for $i \le j < i+k$.
In the following we denote the terminal shapes ($\lambda_{i+k}$) of non-rightmost irreducibles
by $\varnothing^*$ and the terminal shape of the rightmost irreducible by $\varnothing^\dagger$. Accordingly we
distinguish three types of shapes $\varnothing$, $\varnothing^*$ and $\varnothing^\dagger$. We can now express the distance
in terms of numbers of $\varnothing^*$ and $\varnothing$ shapes as follows

(2.2)    $d_n(\sigma_n) = 2\,|\{\varnothing^* \in \beta(\sigma_n)\}| + |\{\varnothing \in \beta(\sigma_n)\}|$.

3. Combinatorial analysis

Let w(n, d) denote the number of RNA secondary structures $\sigma_n$ having distance $d_n$.
In the following we shall write d instead of $d_n$ and consider

(3.1)    $\mathbf{W}(z,u) = \sum_{n \ge 0} \sum_{d \ge 0} w(n,d)\, z^n u^d$,

the bivariate generating function of the number of RNA secondary structures of
length n having distance d and set $w(n) = \sum_{d \ge 0} w(n,d)$. Let $\mathbf{S}(z)$ denote the
generating function of RNA secondary structures and $\mathbf{Irr}(z)$ denote the generating
function of irreducible secondary structures (irreducibles). Let furthermore $\mathcal{S}_n$
denote the set of secondary structures of length n and $\mathcal{I}_n$ denote the set of irreducible
structures of length n.
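The correspondence of Proposition 1 and the way the distance is read off a 1-tableaux can be tried out on small examples. The sketch below is illustrative and rests on assumptions that are not part of the paper: a one-row shape is encoded by its number of squares, a structure is a list of arcs (i, j), and the names `tableau_row_lengths` and `distance_from_tableau` are hypothetical. The distance is counted in a form equivalent to the verbal rule used later in the text (two for each non-rightmost irreducible, one for the rightmost irreducible and for each isolated top-level vertex), namely as the number of top-level blocks minus one plus the number of irreducibles.

```python
def tableau_row_lengths(n, arcs):
    """Row lengths (lambda_0, ..., lambda_n) obtained by scanning the
    vertices n, n-1, ..., 1 as in the proof of Proposition 1.

    lambda_j equals the number of arcs (i, k) with i <= j < k.
    """
    ends = {k for _, k in arcs}
    starts = {i for i, _ in arcs}
    lengths = [0] * (n + 1)
    size = 0
    for j in range(n, 0, -1):
        if j in ends:
            size += 1          # endpoint of an arc: add one square
        elif j in starts:
            size -= 1          # start point of an arc: remove one square
        lengths[j - 1] = size  # isolated point: shape is unchanged
    return lengths


def distance_from_tableau(lengths):
    """5'-3' distance read off the positions of the empty shapes."""
    zeros = [j for j, size in enumerate(lengths) if size == 0]
    blocks = [b - a for a, b in zip(zeros, zeros[1:])]
    # a block of width 1 is an isolated vertex, a wider block an irreducible
    irreducibles = sum(1 for width in blocks if width > 1)
    return (len(blocks) - 1) + irreducibles
```

For the structure on seven vertices with arcs (1, 3) and (4, 7) the row lengths are [0, 1, 1, 0, 1, 1, 1, 0]; the two irreducible blocks give distance (2 - 1) + 2 = 3.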



Theorem 1. The bivariate generating function of the number of RNA secondary
structures of length n with distance d is given by

(3.2)    $\mathbf{W}(z,u) = \frac{u z^2 (\mathbf{S}(z) - 1)}{(1 - zu)^2 - (1 - zu)(zu)^2 (\mathbf{S}(z) - 1)} + \frac{z}{1 - zu}$.



Proof. We set $\mathbf{V}(z,u) = z/(1 - zu)$ and $\mathbf{U}(z,u) = \mathbf{W}(z,u) - \mathbf{V}(z,u)$.

Claim 1: $\mathbf{Irr}(z) = z^2 (\mathbf{S}(z) - 1)$.

To prove Claim 1 we consider the mapping $\gamma\colon \mathcal{I}_n \to \mathcal{T}_{n-2}$, obtained by removing
the shapes $\lambda_1$ and $\lambda_{n-1}$ from $\beta(\sigma_n)$ and removing the rightmost box from all other
shapes $\lambda_j$, $2 \le j \le n-2$. Note that for 1 = (n-1) the tableaux $\beta(\sigma_n)$ corresponds to
a 1-arc, which is impossible. Hence for an irreducible structure $\lambda_1 = \square$ and $\lambda_{n-1} = \square$
are distinct shapes and the induced sequence of shapes
$\mu = (\lambda_0, \lambda_2 \setminus \square, \ldots, \lambda_{n-2} \setminus \square, \lambda_n)$ is again a 1-tableaux, i.e. an element of
$\mathcal{T}_{n-2}$, where $\lambda_j \setminus \square$ denotes the shape $\lambda_j$ with the rightmost □ deleted. Thus
$\gamma$ is well defined. Given a 1-tableaux $\tau = (\lambda_0, \ldots, \lambda_{n-2})$ we consider the map

(3.3)    $\gamma^*(\tau) = (\lambda_0, \square, \lambda_1 \sqcup \square, \ldots, \lambda_{n-3} \sqcup \square, \square, \lambda_{n-2})$,

where $\lambda_j \sqcup \square$ denotes the shape $\lambda_j$ with a □ added, see Fig. 7.

By construction, $\gamma^* \circ \gamma = \mathrm{id}$, whence Claim 1. Let us first compute the contribution
of secondary structures containing at least one irreducible.

Claim 2: Suppose $\sigma_n$ has distance d, then (i + 1) irreducibles can be arranged in
exactly $\binom{d-i}{i+1}$ ways.

Indeed, in view of $d = 2\,|\{\varnothing^* \in \beta(\sigma_n)\}| + |\{\varnothing \in \beta(\sigma_n)\}|$, the distance-contribution
of the rightmost irreducible and each isolated point is one, while the contribution of
all remaining i irreducibles equals two. No two such contributions overlap, whence
replacing d by d - i we have $\binom{d-i}{i+1}$ ways to place the (i + 1) irreducibles and Claim
2 follows. Accordingly, we obtain for fixed d

(3.4)    $\sum_{n \ge d} \mathsf{u}(n,d)\, z^n = \sum_{i \ge 0} \binom{d-i}{i+1}\, \mathbf{Irr}(z)^{i+1} z^{d-2i-1}$,

where $\mathsf{u}(n,d)$ denotes the coefficient of $z^n u^d$ in $\mathbf{U}(z,u)$, the indeterminant z
corresponds to the isolated points and $\mathbf{Irr}(z)$ represents the irreducible structures
labeled by the $\varnothing^*$ and $\varnothing^\dagger$. Consequently, rearranging terms we derive

(3.5)    $\mathbf{U}(z,u) = \sum_{d \ge 1} \sum_{n \ge d} \mathsf{u}(n,d)\, z^n u^d = \sum_{i \ge 0} \sum_{d \ge 1} \binom{d-i}{i+1}\, \mathbf{Irr}(z)^{i+1} z^{d-2i-1} u^d$

and therefore

(3.6)    $\mathbf{U}(z,u) = \sum_{i \ge 0} \frac{\mathbf{Irr}(z)^{i+1}}{z^{2i+1}} \sum_{d \ge 1} \binom{d-i}{i+1} (zu)^d$.

Using $\sum_{n \ge 0} \binom{n}{r} x^n = \frac{x^r}{(1-x)^{r+1}}$, $r \ge 0$, we compute

$\mathbf{U}(z,u) = \sum_{i \ge 0} \frac{\mathbf{Irr}(z)^{i+1}}{z^{2i+1}} \cdot \frac{(zu)^{2i+1}}{(1-zu)^{i+2}}
 = \frac{u\,\mathbf{Irr}(z)}{(1-zu)^2} \cdot \frac{1}{1 - \frac{u^2\,\mathbf{Irr}(z)}{1-zu}}
 = \frac{u z^2 (\mathbf{S}(z) - 1)}{(1 - zu)^2 - (1 - zu) z^2 u^2 (\mathbf{S}(z) - 1)}$.

It remains to consider RNA secondary structures that contain no irreducibles, i.e. RNA
secondary structures consisting exclusively of isolated vertices. Clearly,

(3.7)    $\mathbf{V}(z,u) = \sum_{n \ge 1} z^n u^{n-1} = \frac{z}{1 - zu}$

and the proof of the theorem is complete. □
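The proof translates directly into a way of tabulating w(n, d) for a fixed length n. The following Python sketch is a minimal illustration under stated assumptions: the function names `structure_counts` and `distance_counts` are hypothetical, the recurrence for the number of secondary structures (first vertex isolated, or paired by an arc enclosing at least one vertex) is the standard decomposition and is not spelled out in the text, and the main loop extracts the coefficient of $z^n$ from eq. (3.4) and adds the single all-isolated structure counted by $\mathbf{V}(z,u)$.

```python
from math import comb


def structure_counts(n_max):
    """s[n] = number of secondary structures on n vertices (s[0] = 1)."""
    s = [1] * (n_max + 1)
    for n in range(1, n_max + 1):
        # vertex 1 is isolated, or paired by an arc enclosing k >= 1 vertices
        s[n] = s[n - 1] + sum(s[k] * s[n - 2 - k] for k in range(1, n - 1))
    return s


def distance_counts(n):
    """w[d] = number of secondary structures of length n >= 1 with distance d."""
    s = structure_counts(n)
    # Claim 1: Irr(z) = z^2 (S(z) - 1), hence [z^k] Irr(z) = s[k - 2] for k >= 3.
    irr = [0] * (n + 1)
    for k in range(3, n + 1):
        irr[k] = s[k - 2]

    w = [0] * n
    w[n - 1] += 1                        # V(z, u): all vertices isolated
    power = [1] + [0] * n                # coefficients of Irr(z)^0
    i = 0
    while True:
        # truncated product: power becomes Irr(z)^(i + 1)
        power = [sum(power[a] * irr[m - a] for a in range(m + 1))
                 for m in range(n + 1)]
        if not any(power):
            break
        for d in range(2 * i + 1, n):
            isolated = d - 2 * i - 1     # exponent of z in eq. (3.4)
            w[d] += comb(d - i, i + 1) * power[n - isolated]
        i += 1
    return w
```

For example, `distance_counts(4)` returns `[0, 1, 2, 1]`: of the four structures on four vertices, one has distance 1, two have distance 2 and one has distance 3. Dividing each entry by the sum of the list gives the distribution of distances for that length.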



Setting p(n, d) = w(n, d)/w(n), Theorem 1 provides the distribution of distances
for RNA secondary structures of any fixed length, n, see Tab. 1.

4. The singular expansion

In this section we analyze the asymptotics of the nth coefficient, $[z^n]\mathbf{W}(z,u)$. This
will play a crucial role for the computation of the limit distribution of distances in
Section 5.

Let us first establish some facts needed for deriving the singular expansion:



Lemma 1. $\mathbf{W}(z,u)$ is algebraic over the rational function field $\mathbb{C}(z,u)$ and has
the unique dominant singularity, $\rho = (3 - \sqrt{5})/2$, which coincides with the unique
dominant singularity of $\mathbf{S}(z)$.

Proof. The fact that $\mathbf{W}(z,u)$ is algebraic over the rational function field $\mathbb{C}(z,u)$ follows
immediately from Theorem 1 where we proved

$\mathbf{W}(z,u) = \frac{u z^2 (\mathbf{S}(z) - 1)}{(1 - zu)^2 - (1 - zu)(zu)^2 (\mathbf{S}(z) - 1)} + \frac{z}{1 - zu}$,

since evidently all numerators and denominators are polynomial expressions in u,
z and $\mathbf{S}(z)$, where

(4.1)    $\mathbf{S}(z) = \frac{1 - z + z^2 - \sqrt{(z^2 + z + 1)(z^2 - 3z + 1)}}{2 z^2}$.

Thus the field $\mathbb{C}(z,u)[\mathbf{S}(z)]$ is algebraic of degree two over $\mathbb{C}(z,u)$. The second
assertion follows from $u \in (0,1)$ and a straightforward analysis of the singularities
of the two denominators $(1 - zu)^2 - (1 - zu)(zu)^2 (\mathbf{S}(z) - 1)$ and $(1 - zu)$. □



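Given two numbers $\phi$, r, where $r > |\kappa|$ and $0 < \phi < \frac{\pi}{2}$, the open domain $\Delta_\kappa(\phi, r)$ is
defined as

$\Delta_\kappa(\phi, r) = \{ z \mid |z| < r,\ z \neq \kappa,\ |\mathrm{Arg}(z - \kappa)| > \phi \}$.

A domain is a $\Delta_\kappa$-domain at $\kappa$ if it is of the form $\Delta_\kappa(\phi, r)$ for some r and $\phi$. A
function is $\Delta_\kappa$-analytic if it is analytic in some $\Delta_\kappa$-domain.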



Suppose an algebraic function has a unique singularity $\kappa$. According to (Flajolet and
Sedgewick 2009; Stanley 1980) such a function is $\Delta_\kappa(\phi, r)$-analytic. In particular,
$\mathbf{W}(z,u)$ is $\Delta_\rho(\phi, r)$-analytic. We introduce the notation

$\bigl(f(z) = o(g(z)) \text{ as } z \to \kappa\bigr) \iff \bigl(f(z)/g(z) \to 0 \text{ as } z \to \kappa\bigr)$

and if we write f(z) = o(g(z)) it is implicitly assumed that z tends to the (unique)
singularity. The following transfer theorem allows us to obtain the asymptotics of
the coefficients from the generating functions.



Theorem 2. (Flajolet and Sedgewick 2009) Let f(z) be a $\Delta_\kappa$-analytic function at
its unique singularity $z = \kappa$. Let $g(z) \in \{(\kappa - z)^\alpha \mid \alpha \in \mathbb{R}\}$. Suppose we have in the
intersection of a neighborhood of $\kappa$ with the $\Delta_\kappa$-domain

$f(z) = o(g(z))$ for $z \to \kappa$.

Then we have

$[z^n] f(z) = o([z^n] g(z))$.



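In addition, according to (Flajolet et al. 2005) we have for $\alpha \in \mathbb{C} \setminus \mathbb{Z}_{\le 0}$

(4.2)    $[z^n] (1 - z)^{-\alpha} \sim \frac{n^{\alpha - 1}}{\Gamma(\alpha)} \left[ 1 + \frac{\alpha(\alpha - 1)}{2n} + O\!\left(\frac{1}{n^2}\right) \right]$.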
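The quality of the approximation in eq. (4.2) is easy to inspect numerically. The sketch below compares the exact coefficient, which for real $\alpha$ equals $\Gamma(n+\alpha)/(\Gamma(\alpha)\, n!)$, with the two-term estimate. It is an illustration only: the function names are hypothetical, $\alpha$ is restricted to real values, and the use of `lgamma` assumes $n + \alpha > 0$ so that the Gamma value in the numerator is positive.

```python
from math import exp, gamma, lgamma


def exact_coefficient(n, alpha):
    """[z^n](1 - z)^(-alpha) = Gamma(n + alpha) / (Gamma(alpha) * n!).

    Assumes real alpha that is not a non-positive integer and n + alpha > 0.
    """
    return exp(lgamma(n + alpha) - lgamma(n + 1)) / gamma(alpha)


def two_term_estimate(n, alpha):
    """Right-hand side of eq. (4.2) without the O(1/n^2) remainder."""
    return n ** (alpha - 1) / gamma(alpha) * (1 + alpha * (alpha - 1) / (2 * n))
```

The case relevant for the square-root singularity treated below is $\alpha = -1/2$, where the coefficients are negative for $n \ge 1$ because $\Gamma(-1/2) < 0$.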



We next observe $\mathbf{U}(z,u) = h(z,u)\, f(g(z,u))$, where $g(z,u) = u z^2 (\mathbf{S}(z) - 1)/(1 - uz)$,
$f(w) = w/(1 - uw)$, $h(z,u) = 1/(1 - zu)$ and $t(z,u) = u z^2/(1 - uz)$. In
preparation for the proof of Lemma 2 we set

$\alpha = \frac{2(-2 + \sqrt{5})\,u}{2 + (-3 + \sqrt{5})\,u}$,

$C_0 = \frac{2}{2 - (3 - \sqrt{5})\,u} \left[ f(\alpha) + \frac{d f(w)}{dw}\Big|_{w=\alpha} \left( t(\rho,u)\, \frac{2}{\sqrt{5} - 1} - \alpha \right) \right]$,

$r(\rho,u) = -\frac{2}{2 - (3 - \sqrt{5})\,u} \cdot \frac{d f(w)}{dw}\Big|_{w=\alpha} \cdot t(\rho,u) \cdot \frac{\sqrt{8(3\sqrt{5} - 5)}}{(-3 + \sqrt{5})^2}$.

Furthermore, let v(z) and w(z) be D-finite power series such that w(0) = 0 and let
$\rho_v$, $\rho_w$ denote their respective radius of convergence. We set $\tau_w = \lim_{z \to \rho_w^-} w(z)$ and
call the D-finite power series F(z) = v(w(z)) subcritical if and only if $\tau_w < \rho_v$.

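Lemma 2. The singular expansion of $\mathbf{W}(z,u)$ at its unique, dominant singularity
$\rho$ is given by

(4.3)    $\mathbf{W}(z,u) = C_0 + \mathbf{V}(\rho,u) + r(\rho,u)(\rho - z)^{1/2} + O(\rho - z)$.

Proof. Since g(0, u) = 0, the composition f(g(z,u)) is well defined as a formal power
series and $\mathbf{V}(z,u) = z/(1 - zu)$ as well as h(z,u) are regular at $\rho$. Since $u \in (0,1)$ we have
$1/u > 1 > \rho$, whence the dominant singularity of g(z,u) equals $\rho$. Next we observe

$g(\rho,u) = \frac{u(1 - \rho - \rho^2)}{2(1 - u\rho)} < \frac{0.7\,u}{2(1 - 0.4\,u)} = \frac{0.35\,u}{1 - 0.4\,u} < 1$,

whence f(g(z,u)) is governed by the subcritical paradigm.

Claim 1.

(4.4)    $g(z,u) = t(\rho,u)\,\frac{2}{\sqrt{5} - 1} - t(\rho,u)\,\frac{\sqrt{8(3\sqrt{5} - 5)(\rho - z)}}{(-3 + \sqrt{5})^2} + O(\rho - z)$.

To prove the Claim we consider the singular expansion of $\mathbf{S}(z)$ at $\rho$

(4.5)    $\mathbf{S}(z) - 1 = \frac{2}{\sqrt{5} - 1} - \frac{\sqrt{8(3\sqrt{5} - 5)(\rho - z)}}{(-3 + \sqrt{5})^2} + O(\rho - z)$.

The singular expansion of g(z,u) at $\rho$ is obtained by multiplying the regular
expansion of t(z,u) and singular expansion of $\mathbf{S}(z) - 1$. Clearly,

(4.6)    $t(z,u) = t(\rho,u) - \frac{\partial t(z,u)}{\partial z}\Big|_{z=\rho} (\rho - z) + O((\rho - z)^2)$,

where $t(\rho,u) = (7 - 3\sqrt{5})\,u/(2 - (3 - \sqrt{5})\,u)$. Thus

(4.7)    $g(z,u) = t(\rho,u)\,\frac{2}{\sqrt{5} - 1} - t(\rho,u)\,\frac{\sqrt{8(3\sqrt{5} - 5)(\rho - z)}}{(-3 + \sqrt{5})^2} + O(\rho - z)$.

Setting $\alpha = g(\rho,u) = 2(-2 + \sqrt{5})\,u/(2 + (-3 + \sqrt{5})\,u)$, the regular expansion of f(w)
at $\alpha$ is

(4.8)    $f(w) = f(\alpha) + \frac{d f(w)}{dw}\Big|_{w=\alpha} (w - \alpha) + O((w - \alpha)^2)$,

where $\frac{d f(w)}{dw}\Big|_{w=\alpha} = \left( \frac{2 + (-3 + \sqrt{5})\,u}{2 + (-3 + \sqrt{5})\,u - 2(-2 + \sqrt{5})\,u^2} \right)^2$, and accordingly

(4.9)    $f(g(z,u)) = C_1 - \frac{d f(w)}{dw}\Big|_{w=\alpha}\, t(\rho,u)\,\frac{\sqrt{8(3\sqrt{5} - 5)(\rho - z)}}{(-3 + \sqrt{5})^2} + O(\rho - z)$,

where $C_1 = f(\alpha) + \frac{d f(w)}{dw}\Big|_{w=\alpha} \left( t(\rho,u)\,\frac{2}{\sqrt{5} - 1} - \alpha \right)$. Multiplying by the regular
expansion of h(z,u) at $\rho$ and adding the regular expansion of $\mathbf{V}(z,u)$ implies the
lemma. □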
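The square-root term of the expansion can be checked numerically: for small $\varepsilon > 0$ the quotient $(\mathbf{W}(\rho - \varepsilon, u) - \mathbf{W}(\rho, u))/\sqrt{\varepsilon}$ should approach the coefficient of $(\rho - z)^{1/2}$. The Python sketch below is illustrative only. The names `W_near_rho` and `r_closed_form` are hypothetical, and the closed form is assembled as $-h(\rho,u)\,\frac{df}{dw}(\alpha)\,t(\rho,u)\,\sqrt{8(3\sqrt{5}-5)}/(-3+\sqrt{5})^2$, i.e. as the coefficient that results from multiplying eq. (4.9) by $h(\rho,u)$; this reading of $r(\rho,u)$ is an assumption of the example.

```python
from math import sqrt

RHO = (3 - sqrt(5)) / 2        # dominant singularity of S(z)
RHO_CONJ = (3 + sqrt(5)) / 2   # second root of z^2 - 3z + 1


def W_near_rho(eps, u):
    """W(rho - eps, u) from Theorem 1, with S(z) taken from eq. (4.1)."""
    z = RHO - eps
    # z^2 - 3z + 1 = (rho - z)(rho' - z); the factored form avoids cancellation
    disc = (z * z + z + 1) * eps * (RHO_CONJ - z)
    s_minus_1 = (1 - z + z * z - sqrt(disc)) / (2 * z * z) - 1
    x = 1 - z * u
    return u * z * z * s_minus_1 / (x * x - x * (z * u) ** 2 * s_minus_1) + z / x


def r_closed_form(u):
    """Assumed coefficient of (rho - z)^(1/2): -h * f'(alpha) * t * c."""
    t = (7 - 3 * sqrt(5)) * u / (2 - (3 - sqrt(5)) * u)        # t(rho, u)
    alpha = 2 * (sqrt(5) - 2) * u / (2 + (-3 + sqrt(5)) * u)   # g(rho, u)
    f_prime = 1 / (1 - u * alpha) ** 2                         # f'(alpha)
    h = 2 / (2 - (3 - sqrt(5)) * u)                            # h(rho, u)
    c = sqrt(8 * (3 * sqrt(5) - 5)) / (-3 + sqrt(5)) ** 2
    return -h * f_prime * t * c


def numerical_slope(u, eps=1e-10):
    """Finite-difference estimate of the coefficient of (rho - z)^(1/2)."""
    return (W_near_rho(eps, u) - W_near_rho(0.0, u)) / sqrt(eps)
```

Both values are negative for $u \in (0,1)$, in line with the fact that the coefficients of $(\rho - z)^{1/2}$ are negative while those of $\mathbf{W}(z,u)$ are not.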

5. The limit distribution

In this section we shall prove that for any finite d

(5.1)    $\lim_{n \to \infty} \frac{w(n,d)}{w(n)} = q(d)$

holds. We furthermore determine the limit distribution via computing the power series

$\mathbf{Q}(u) = \sum_{d \ge 1} q(d)\, u^d$.

Theorem 3 below ensures that under certain conditions the pointwise convergence
of probability generating functions implies the convergence of its coefficients.

Theorem 3. Let u be an indeterminate and $\Omega$ be a set contained in the unit disc,
having at least one accumulation point in the interior of the disc. Assume
$P_n(u) = \sum_{d \ge 0} p(n,d)\, u^d$ and $Q(u) = \sum_{d \ge 0} q(d)\, u^d$ such that
$\lim_{n \to \infty} P_n(u) = Q(u)$ for each $u \in \Omega$ holds. Then we have for any finite d,

(5.2)    $\lim_{n \to \infty} p(n,d) = q(d)$

and

(5.3)    $\lim_{n \to \infty} \sum_{j \le d} p(n,j) = \sum_{j \le d} q(j)$.



Let $m_1(u) = (-7 + 3\sqrt{5})\,u$ and

$m_2(u) = -2 - 2(-3 + \sqrt{5})\,u + (-15 + 7\sqrt{5})\,u^2 + (22 - 10\sqrt{5})\,u^3 + 2(-9 + 4\sqrt{5})\,u^4$.



Theorem 4. For any $d \ge 1$ holds

(5.4)    $\lim_{n \to \infty} p(n,d) = \lim_{n \to \infty} \frac{w(n,d)}{w(n)} = q(d)$,

where q(d) is given via the probability generating function $\mathbf{Q}(u)$

(5.5)    $\mathbf{Q}(u) = \frac{m_1(u)}{m_2(u)}$.



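Proof. According to Lemma 2 the singular expansion of $\mathbf{W}(z,u)$ is given by

(5.6)    $\mathbf{W}(z,u) = C_0 + \mathbf{V}(\rho,u) + r(\rho,u)(\rho - z)^{1/2} + O(\rho - z)$.

Thus

(5.7)    $[z^n]\mathbf{W}(z,u) = r(\rho,u)\,[z^n](\rho - z)^{1/2} + [z^n]\,O(\rho - z)$.

In view of $O(z - \rho) = o((z - \rho)^{1/2})$, Theorem 2 implies

(5.8)    $[z^n]\mathbf{W}(z,u) \sim r(\rho,u)\,[z^n](\rho - z)^{1/2}$.

Employing eq. (4.2) we obtain

(5.9)    $[z^n]\mathbf{W}(z,u) \sim -r(\rho,u)\, K\, n^{-3/2} \rho^{-n} \left(1 + O\!\left(\frac{1}{n}\right)\right)$,

for some constant K > 0 that does not depend on u; the sign reflects that
$\Gamma(-1/2) < 0$ in eq. (4.2). Substituting for $r(\rho,u)$ we arrive at

$[z^n]\mathbf{W}(z,u) \sim \frac{m_1(u)}{m_2(u)} \cdot \frac{2\sqrt{6\sqrt{5} - 10}}{(-3 + \sqrt{5})^2} \cdot K\, n^{-3/2} \rho^{-n} \left(1 + O\!\left(\frac{1}{n}\right)\right)$

and in particular for u = 1, where $m_1(1) = m_2(1)$,

$[z^n]\mathbf{W}(z,1) \sim \frac{2\sqrt{6\sqrt{5} - 10}}{(-3 + \sqrt{5})^2} \cdot K\, n^{-3/2} \rho^{-n} \left(1 + O\!\left(\frac{1}{n}\right)\right)$.

We consequently have

(5.10)    $\lim_{n \to \infty} \frac{[z^n]\mathbf{W}(z,u)}{[z^n]\mathbf{W}(z,1)} = \frac{m_1(u)}{m_2(u)}$.

Therefore, setting $P_n(u) = \sum_{d} p(n,d)\, u^d$,

(5.11)    $\lim_{n \to \infty} P_n(u) = \mathbf{Q}(u)$.

Since $\Omega = (0,1)$ has an accumulation point in the interior of the unit disc, and
eq. (5.10) holds for each $u \in \Omega$, Theorem 3 implies for any finite d

(5.12)    $\lim_{n \to \infty} p(n,d) = \lim_{n \to \infty} \frac{w(n,d)}{w(n)} = q(d)$.

□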
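The coefficients q(d) can be extracted from eq. (5.5) by comparing coefficients in $\mathbf{Q}(u)\, m_2(u) = m_1(u)$. The following Python sketch does this in floating-point arithmetic; the function name `limit_distribution` and the choice of a plain linear recurrence are assumptions of this illustration rather than the method used in the paper.

```python
from math import sqrt


def limit_distribution(d_max):
    """q[0], ..., q[d_max] with Q(u) = m_1(u) / m_2(u) = sum_d q[d] u^d."""
    r5 = sqrt(5)
    m1 = [0.0, -7 + 3 * r5]                                   # m_1(u)
    m2 = [-2.0, -2 * (-3 + r5), -15 + 7 * r5,                 # m_2(u)
          22 - 10 * r5, 2 * (-9 + 4 * r5)]
    q = []
    for d in range(d_max + 1):
        # coefficient of u^d in Q(u) * m_2(u) must equal that of m_1(u)
        known = sum(m2[j] * q[d - j] for j in range(1, min(d, 4) + 1))
        target = m1[d] if d < len(m1) else 0.0
        q.append((target - known) / m2[0])
    return q
```

The first nonzero value is $q(1) = (7 - 3\sqrt{5})/2 \approx 0.146$, and the values sum to one up to rounding, as expected of a probability generating function with $\mathbf{Q}(1) = 1$.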



We finally compute the asymptotic expression of q(d). For this purpose we recall
that the density function of a $\Gamma(\lambda, r)$-distribution is given by

(5.13)    $f_{\lambda,r}(x) = \begin{cases} \frac{\lambda^r}{\Gamma(r)}\, x^{r-1} e^{-\lambda x}, & x > 0 \\ 0, & x \le 0 \end{cases}$

where $\lambda > 0$ and r > 0.

Corollary 1. Let $\rho$ be the real positive dominant singularity of $\mathbf{S}(z)$ and set $\delta$ = […].
Then […].

That is, in the limit of large distances the coefficient q(d) is determined by the
density function of a $\Gamma(\ln \delta, 2)$-distribution.



6. Discussion 

The results of this paper suggest that the number of base pairs alone is not sufficient
to explain the distribution of 5'-3' distances. Surprisingly, we find that the 5'-3'
distances of random structures are much smaller than those of mfe-structures, despite
the fact that random structures contain fewer base pairs, see Fig. 9.

By definition, only irreducibles and isolated vertices contribute to the 5'-3' distance.
The particular number of base pairs contained within irreducible substructures is
irrelevant. It has been shown in (Jin and Reidys 2010a) that there exists a limit
distribution for the number of irreducibles in random RNA secondary structures.
This limit distribution is determined by a $\Gamma$-distribution similar to Corollary 1.
As a result, random RNA secondary structures have only very few irreducibles,
typically two or three. This constitutes a feature shared by RNA mfe-structures.
Thus in case of random and mfe-structures a few irreducibles "cover" almost the
entire sequence since the 5'-3' distance is, even in the limit of large sequence length,
finite. The distinctively larger 5'-3' distance of mfe-structures consequently stems
from the fact that their irreducibles cover a distinctively smaller fraction of the
sequence. Hence the irreducibles of mfe-structures differ in a subtle way from those
of random RNA structures. We show in the following that the shift of the 5'-3'
distance is a combinatorial consequence of large stacks observed in mfe-structures,
see Fig. 8.

Here a stack of length r is a maximal sequence of "parallel" arcs,
((i, j), (i + 1, j - 1), ..., (i + (r - 1), j - (r - 1))). RNA secondary structures with stack
length $\ge r$ are called r-canonical RNA secondary structures. Let $w_r(n,d)$ denote the number
of r-canonical RNA secondary structures $\sigma_{r,n}$ having distance $d_n$. We shall write d
instead of $d_n$ and consider

(6.1)    $\mathbf{W}_r(z,u) = \sum_{n \ge 0} \sum_{d \ge 0} w_r(n,d)\, z^n u^d$,

the bivariate generating function of the number of RNA secondary structures with
minimum stack-size r of length n having distance d and set $w_r(n) = \sum_{d \ge 0} w_r(n,d)$.
Let $\mathbf{S}_r(z)$ denote the generating function of r-canonical RNA secondary structures.

Set

(6.2)    $\rho_r(z) = \left( z^{2r} - (z - 1)(z^{2r} - z^2 + 1) \right)^2 - 4 z^{2r} (z^{2r} - z^2 + 1)$.

Then the generating function of r-canonical secondary structures is given by

(6.3)    $\mathbf{S}_r(z) = \frac{z^{2r} - (z - 1)(z^{2r} - z^2 + 1) - \sqrt{\rho_r(z)}}{2 z^{2r}}$

and we can derive it using symbolic enumeration (Flajolet and Sedgewick 2009).
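The counts of r-canonical structures can also be generated without the closed form. The Python sketch below is a minimal illustration under assumptions that are not stated in the text: it introduces a hypothetical auxiliary series $\mathbf{T}_r(z)$ for irreducible r-canonical structures and uses the functional equations $(1 - z^2 + z^{2r})\,\mathbf{T}_r = z^{2r}(\mathbf{S}_r - 1)$ and $\mathbf{S}_r = 1 + z\,\mathbf{S}_r + \mathbf{T}_r\,\mathbf{S}_r$, which were chosen because eliminating $\mathbf{T}_r$ leads to the quadratic equation solved by eq. (6.3). The function name `canonical_counts` is likewise hypothetical.

```python
def canonical_counts(r, n_max):
    """s[n] = number of r-canonical secondary structures on n vertices."""
    s = [0] * (n_max + 1)   # coefficients of S_r(z)
    t = [0] * (n_max + 1)   # coefficients of the assumed series T_r(z)
    s[0] = 1

    def coeff(seq, k):
        # coefficient lookup that treats negative indices as zero
        return seq[k] if k >= 0 else 0

    for n in range(1, n_max + 1):
        # [z^(n - 2r)] (S_r - 1): the constant term of S_r is excluded
        inner = s[n - 2 * r] if n - 2 * r >= 1 else 0
        # (1 - z^2 + z^(2r)) T_r = z^(2r) (S_r - 1)
        t[n] = coeff(t, n - 2) - coeff(t, n - 2 * r) + inner
        # S_r = 1 + z S_r + T_r S_r
        s[n] = s[n - 1] + sum(t[k] * s[n - k] for k in range(1, n + 1))
    return s
```

For r = 1 the routine reproduces the numbers of ordinary secondary structures, 1, 1, 1, 2, 4, 8, 17, ...; for r = 2 the first structure with an arc appears at length 5, so the sequence starts 1, 1, 1, 1, 1, 2, 4, 8.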



Theorem 5. The bivariate generating function of the number of r-canonical RNA
secondary structures of length n with distance d is given by

(6.4)    $\mathbf{W}_r(z,u) = \frac{u z^{2r} (\mathbf{S}_r(z) - 1)}{(1 - zu)^2 (1 - z^2 + z^{2r}) - (1 - zu)\, u^2 z^{2r} (\mathbf{S}_r(z) - 1)} + \frac{z}{1 - zu}$.



Along the lines of our analysis subsequent to Theorem 1 we can then obtain the
singular expansion and the limit distributions for the 5'-3' distances of r-canonical
RNA secondary structures, see Fig. 9.

   d    p(n,d)              d    p(n,d)              d    p(n,d)
  13    8.22 x 10^-3       19    2.85 x 10^-4       25    6.56 x 10^-7
  14    5.19 x 10^-3       20    1.36 x 10^-4       26    1.24 x 10^-7
  15    3.17 x 10^-3       21    5.99 x 10^-5       27    1.64 x 10^-8
  16    1.86 x 10^-3       22    2.41 x 10^-5       28    1.30 x 10^-9
  17    1.05 x 10^-3       23    8.58 x 10^-6       29    4.65 x 10^-11
  18    5.62 x 10^-4       24    2.63 x 10^-6

Table 1. The distribution of distances of RNA secondary structures of
length 30. The data of this table are represented in Fig. 3 as "+".






Figure 1. RNA secondary structures as diagrams: the backbone of the 
RNA molecule is drawn as a horizontal line and Watson-Crick base pairs are 
represented as arcs in the upper half-plane. An RNA secondary structure 
has no 1-arcs and only noncrossing arcs. 







Figure 2. The 5'-3' distance of RNA secondary structures: distance
contributing backbone-edges and arcs are drawn in blue. The structure on the
lhs has 5'-3' distance 2 and the structure on the rhs has 5'-3' distance 6.

Figure 3. The distribution of 5'-3' distances of RNA secondary
structures: We display the distribution of distances in RNA secondary structures
of length 30 (+) derived via Theorem 1. We furthermore show the distribution
of distances in the limit of long RNA secondary structures (•) obtained
via Theorem 4.

Figure 4. A 1-tableaux: at each step either nothing happens or a single
□ is added or removed.

Figure 5. Mapping RNA secondary structures into 1-tableaux.




Figure 6. A secondary structure and its 1-tableaux: its 5'-3' distance
equals twice the number of $\varnothing^*$ plus the number of $\varnothing$ shapes, i.e. 2 x 2 + 4 = 8.



Figure 7. The mappings $\gamma$ and $\gamma^*$.

Figure 8. The 5'-3' distance of random structures and mfe-structures:
We display the distance distribution of RNA secondary structures of length 30
and the limit distribution (•) as well as that of a sample of 5000 mfe-structures
obtained from random sequences of length 100 (o).

Figure 9. The 5'-3' limit distance distribution of r-canonical RNA
structures and mfe-structures: We display the limit distance distribution of
r-canonical RNA structures of length 45: (gray line: r = 1), (cyan line:
r = 3), (orange line: r = 5), (green line: r = 10) as well as a sample of 10000
mfe-structures obtained from random sequences of length 100 (black line).



